Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 279495 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 35995 |
| Duplicate rows (%) | 12.9% |
| Total size in memory | 29.9 MiB |
| Average record size in memory | 112.0 B |
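The size and duplicate figures above can be cross-checked with a little arithmetic. The sketch below assumes every one of the 14 columns is stored as an 8-byte value (for example `int64`); the report does not state the dtypes, so `BYTES_PER_VALUE` is an assumption, and the small index overhead is ignored.

```python
# Figures copied from the "Dataset statistics" table.
N_ROWS = 279495
N_COLUMNS = 14
N_DUPLICATE_ROWS = 35995

BYTES_PER_VALUE = 8   # assumption: each column holds 8-byte values
MIB = 1024 ** 2       # bytes per mebibyte

record_bytes = N_COLUMNS * BYTES_PER_VALUE       # size of one row
column_mib = N_ROWS * BYTES_PER_VALUE / MIB      # size of one column
total_mib = N_ROWS * record_bytes / MIB          # size of the whole table
duplicate_pct = 100 * N_DUPLICATE_ROWS / N_ROWS  # share of duplicate rows

print(f"record size: {record_bytes} B")
print(f"per column:  {column_mib:.1f} MiB")
print(f"total:       {total_mib:.1f} MiB")
print(f"duplicates:  {duplicate_pct:.1f}%")
```

Under that assumption the rounded results agree with the reported average record size, the per-variable memory size, the total size in memory, and the duplicate percentage.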
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 5 |
| Dataset has 35995 (12.9%) duplicate rows | Duplicates |
|---|---|
| product_weight_g is highly correlated with product_volume and 1 other field | High correlation |
| payment_installments is highly correlated with payment_type_encoded | High correlation |
| product_volume is highly correlated with product_weight_g | High correlation |
| total_payment is highly correlated with product_weight_g | High correlation |
| payment_type_encoded is highly correlated with payment_installments | High correlation |
| review_score is uniformly distributed | Uniform |
Reproduction
| Analysis started | 2022-04-13 08:11:10.537294 |
|---|---|
| Analysis finished | 2022-04-13 08:12:16.025546 |
| Duration | 1 minute and 5.49 seconds |
| Software version | pandas-profiling v3.1.0 |
order_item_id
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.027267035 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.1 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.1915702753 |
|---|---|
| Coefficient of variation (CV) | 0.1864853721 |
| Kurtosis | 129.3629705 |
| Mean | 1.027267035 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 9.363184054 |
| Sum | 287116 |
| Variance | 0.03669917038 |
| Monotonicity | Not monotonic |
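Because order_item_id takes only seven distinct values, its summary statistics can be recomputed from the value counts listed in the frequency table for this variable. The sketch below does that with the standard library only. It assumes the report uses the sample variance (dividing by n - 1, the pandas default); that convention is an assumption here, not something the report states.

```python
from math import sqrt

# Value counts for order_item_id, copied from the report's frequency table.
value_counts = {1: 272951, 2: 5701, 3: 674, 4: 137, 5: 12, 6: 7, 7: 13}

n = sum(value_counts.values())                        # number of observations
total = sum(v * c for v, c in value_counts.items())   # "Sum" in the report
mean = total / n

# Sample variance (ddof=1): an assumption about the report's convention.
squared_dev = sum(c * (v - mean) ** 2 for v, c in value_counts.items())
variance = squared_dev / (n - 1)
std = sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

print(f"n={n} sum={total}")
print(f"mean={mean:.9f} variance={variance:.10f} std={std:.9f} cv={cv:.9f}")
```

The counts add up to the number of observations in the dataset, and the recomputed mean, variance, and coefficient of variation should agree with the table up to rounding.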
| Value | Count | Frequency (%) |
| 1 | 272951 | |
| 2 | 5701 | 2.0% |
| 3 | 674 | 0.2% |
| 4 | 137 | < 0.1% |
| 7 | 13 | < 0.1% |
| 5 | 12 | < 0.1% |
| 6 | 7 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 272951 | |
| 2 | 5701 | 2.0% |
| 3 | 674 | 0.2% |
| 4 | 137 | < 0.1% |
| 5 | 12 | < 0.1% |
| 6 | 7 | < 0.1% |
| 7 | 13 | < 0.1% |
| Value | Count | Frequency (%) |
| 7 | 13 | < 0.1% |
| 6 | 7 | < 0.1% |
| 5 | 12 | < 0.1% |
| 4 | 137 | < 0.1% |
| 3 | 674 | 0.2% |
| 2 | 5701 | 2.0% |
| 1 | 272951 |
product_weight_g
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 2170 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2206.175746 |
| Minimum | 0 |
|---|---|
| Maximum | 40425 |
| Zeros | 18 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 125 |
| Q1 | 300 |
| median | 700 |
| Q3 | 1850 |
| 95-th percentile | 10250 |
| Maximum | 40425 |
| Range | 40425 |
| Interquartile range (IQR) | 1550 |
Descriptive statistics
| Standard deviation | 3938.01908 |
|---|---|
| Coefficient of variation (CV) | 1.784997903 |
| Kurtosis | 14.94947563 |
| Mean | 2206.175746 |
| Median Absolute Deviation (MAD) | 500 |
| Skewness | 3.474155103 |
| Sum | 616615090 |
| Variance | 15507994.27 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 200 | 16634 | 6.0% |
| 150 | 12904 | 4.6% |
| 250 | 11404 | 4.1% |
| 300 | 10820 | 3.9% |
| 100 | 8455 | 3.0% |
| 400 | 8052 | 2.9% |
| 350 | 7981 | 2.9% |
| 500 | 6650 | 2.4% |
| 600 | 6564 | 2.3% |
| 700 | 4910 | 1.8% |
| Other values (2160) | 185121 |
| Value | Count | Frequency (%) |
| 0 | 18 | < 0.1% |
| 2 | 15 | < 0.1% |
| 25 | 5 | < 0.1% |
| 50 | 2484 | |
| 53 | 5 | < 0.1% |
| 54 | 1 | < 0.1% |
| 55 | 10 | < 0.1% |
| 58 | 4 | < 0.1% |
| 60 | 22 | < 0.1% |
| 61 | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 40425 | 6 | < 0.1% |
| 30000 | 869 | |
| 29800 | 1 | < 0.1% |
| 29750 | 4 | < 0.1% |
| 29700 | 5 | < 0.1% |
| 29600 | 17 | < 0.1% |
| 29500 | 17 | < 0.1% |
| 29250 | 1 | < 0.1% |
| 29150 | 1 | < 0.1% |
| 29100 | 1 | < 0.1% |
payment_installments
Real number (ℝ≥0)
| Distinct | 24 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.972443156 |
| Minimum | 0 |
|---|---|
| Maximum | 24 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 4 |
| 95-th percentile | 10 |
| Maximum | 24 |
| Range | 24 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.738672072 |
|---|---|
| Coefficient of variation (CV) | 0.9213538925 |
| Kurtosis | 2.567251957 |
| Mean | 2.972443156 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.602498653 |
| Sum | 830783 |
| Variance | 7.50032472 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 133203 | |
| 2 | 34537 | 12.4% |
| 3 | 30267 | 10.8% |
| 4 | 20221 | 7.2% |
| 10 | 15403 | 5.5% |
| 5 | 15365 | 5.5% |
| 8 | 12198 | 4.4% |
| 6 | 10902 | 3.9% |
| 7 | 4673 | 1.7% |
| 9 | 1745 | 0.6% |
| Other values (14) | 981 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 2 | < 0.1% |
| 1 | 133203 | |
| 2 | 34537 | 12.4% |
| 3 | 30267 | 10.8% |
| 4 | 20221 | 7.2% |
| 5 | 15365 | 5.5% |
| 6 | 10902 | 3.9% |
| 7 | 4673 | 1.7% |
| 8 | 12198 | 4.4% |
| 9 | 1745 | 0.6% |
| Value | Count | Frequency (%) |
| 24 | 81 | < 0.1% |
| 23 | 7 | < 0.1% |
| 22 | 4 | < 0.1% |
| 21 | 10 | < 0.1% |
| 20 | 43 | < 0.1% |
| 18 | 77 | < 0.1% |
| 17 | 25 | < 0.1% |
| 16 | 14 | < 0.1% |
| 15 | 212 | |
| 14 | 26 | < 0.1% |
order_delivered_customer_time_in_days
Real number (ℝ≥0)
| Distinct | 141 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.65555377 |
| Minimum | 0 |
|---|---|
| Maximum | 208 |
| Zeros | 34 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 7 |
| median | 12 |
| Q3 | 20 |
| 95-th percentile | 36 |
| Maximum | 208 |
| Range | 208 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 11.30220541 |
|---|---|
| Coefficient of variation (CV) | 0.7711892425 |
| Kurtosis | 23.24753891 |
| Mean | 14.65555377 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | 2.908677768 |
| Sum | 4096154 |
| Variance | 127.7398471 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 7 | 18318 | 6.6% |
| 8 | 16401 | 5.9% |
| 6 | 16049 | 5.7% |
| 9 | 15054 | 5.4% |
| 10 | 14124 | 5.1% |
| 5 | 13776 | 4.9% |
| 11 | 13677 | 4.9% |
| 12 | 11908 | 4.3% |
| 13 | 11680 | 4.2% |
| 4 | 10781 | 3.9% |
| Other values (131) | 137727 |
| Value | Count | Frequency (%) |
| 0 | 34 | < 0.1% |
| 1 | 3282 | 1.2% |
| 2 | 6966 | 2.5% |
| 3 | 8713 | |
| 4 | 10781 | |
| 5 | 13776 | |
| 6 | 16049 | |
| 7 | 18318 | |
| 8 | 16401 | |
| 9 | 15054 |
| Value | Count | Frequency (%) |
| 208 | 10 | |
| 195 | 1 | < 0.1% |
| 194 | 10 | |
| 191 | 5 | |
| 189 | 9 | |
| 188 | 1 | < 0.1% |
| 187 | 10 | |
| 182 | 1 | < 0.1% |
| 181 | 4 | < 0.1% |
| 175 | 3 | < 0.1% |
product_volume
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 4410 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15856.43936 |
| Minimum | 168 |
|---|---|
| Maximum | 296208 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.1 MiB |
Quantile statistics
| Minimum | 168 |
|---|---|
| 5-th percentile | 816 |
| Q1 | 2816 |
| median | 6720 |
| Q3 | 19344 |
| 95-th percentile | 60000 |
| Maximum | 296208 |
| Range | 296040 |
| Interquartile range (IQR) | 16528 |
Descriptive statistics
| Standard deviation | 24478.31361 |
|---|---|
| Coefficient of variation (CV) | 1.543745923 |
| Kurtosis | 23.40630858 |
| Mean | 15856.43936 |
| Median Absolute Deviation (MAD) | 4952 |
| Skewness | 3.943044266 |
| Sum | 4431795518 |
| Variance | 599187837.3 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 8000 | 6634 | 2.4% |
| 352 | 4671 | 1.7% |
| 640 | 3298 | 1.2% |
| 816 | 3282 | 1.2% |
| 4096 | 3005 | 1.1% |
| 23625 | 2570 | 0.9% |
| 19800 | 2480 | 0.9% |
| 27000 | 2366 | 0.8% |
| 20000 | 2296 | 0.8% |
| 4800 | 2277 | 0.8% |
| Other values (4400) | 246616 |
| Value | Count | Frequency (%) |
| 168 | 1 | < 0.1% |
| 288 | 1 | < 0.1% |
| 352 | 4671 | |
| 374 | 10 | < 0.1% |
| 378 | 1 | < 0.1% |
| 384 | 51 | < 0.1% |
| 396 | 29 | < 0.1% |
| 408 | 2 | < 0.1% |
| 416 | 13 | < 0.1% |
| 418 | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 296208 | 5 | < 0.1% |
| 294000 | 23 | |
| 293706 | 1 | < 0.1% |
| 288000 | 9 | < 0.1% |
| 287980 | 1 | < 0.1% |
| 285138 | 1 | < 0.1% |
| 282750 | 5 | < 0.1% |
| 281232 | 1 | < 0.1% |
| 277550 | 2 | < 0.1% |
| 274625 | 6 | < 0.1% |
customer_seller_distance
Real number (ℝ≥0)
| Distinct | 2914 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 631.45028 |
| Minimum | 0 |
|---|---|
| Maximum | 8736 |
| Zeros | 111 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 17 |
| Q1 | 226 |
| median | 455 |
| Q3 | 822 |
| 95-th percentile | 2131 |
| Maximum | 8736 |
| Range | 8736 |
| Interquartile range (IQR) | 596 |
Descriptive statistics
| Standard deviation | 614.339412 |
|---|---|
| Coefficient of variation (CV) | 0.9729022719 |
| Kurtosis | 2.823946982 |
| Mean | 631.45028 |
| Median Absolute Deviation (MAD) | 301 |
| Skewness | 1.630010805 |
| Sum | 176487196 |
| Variance | 377412.9131 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 14 | 1080 | 0.4% |
| 16 | 1078 | 0.4% |
| 13 | 1050 | 0.4% |
| 15 | 1024 | 0.4% |
| 17 | 1017 | 0.4% |
| 18 | 995 | 0.4% |
| 11 | 985 | 0.4% |
| 10 | 978 | 0.3% |
| 20 | 970 | 0.3% |
| 23 | 954 | 0.3% |
| Other values (2904) | 269364 |
| Value | Count | Frequency (%) |
| 0 | 111 | < 0.1% |
| 1 | 267 | 0.1% |
| 2 | 439 | |
| 3 | 495 | |
| 4 | 627 | |
| 5 | 629 | |
| 6 | 723 | |
| 7 | 818 | |
| 8 | 937 | |
| 9 | 805 |
| Value | Count | Frequency (%) |
| 8736 | 2 | < 0.1% |
| 8677 | 1 | < 0.1% |
| 8025 | 5 | |
| 7963 | 1 | < 0.1% |
| 3577 | 6 | |
| 3397 | 1 | < 0.1% |
| 3385 | 8 | |
| 3381 | 2 | < 0.1% |
| 3378 | 1 | < 0.1% |
| 3357 | 1 | < 0.1% |
order_purchase_year
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.1 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2018 |
|---|---|
| 2nd row | 2018 |
| 3rd row | 2017 |
| 4th row | 2018 |
| 5th row | 2017 |
Common Values
| Value | Count | Frequency (%) |
| 2018 | 152714 | |
| 2017 | 125836 | |
| 2016 | 945 | 0.3% |
total_payment
Real number (ℝ≥0)
| Distinct | 1515 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 145.0557506 |
| Minimum | 7 |
|---|---|
| Maximum | 6929 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.1 MiB |
Quantile statistics
| Minimum | 7 |
|---|---|
| 5-th percentile | 31 |
| Q1 | 57 |
| median | 96 |
| Q3 | 163 |
| 95-th percentile | 397 |
| Maximum | 6929 |
| Range | 6922 |
| Interquartile range (IQR) | 106 |
Descriptive statistics
| Standard deviation | 193.2919372 |
|---|---|
| Coefficient of variation (CV) | 1.332535501 |
| Kurtosis | 79.42869442 |
| Mean | 145.0557506 |
| Median Absolute Deviation (MAD) | 47 |
| Skewness | 6.746707328 |
| Sum | 40542357 |
| Variance | 37361.773 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 35 | 2752 | 1.0% |
| 37 | 2748 | 1.0% |
| 65 | 2653 | 0.9% |
| 45 | 2631 | 0.9% |
| 47 | 2494 | 0.9% |
| 64 | 2463 | 0.9% |
| 36 | 2432 | 0.9% |
| 77 | 2385 | 0.9% |
| 66 | 2381 | 0.9% |
| 57 | 2314 | 0.8% |
| Other values (1505) | 254242 |
| Value | Count | Frequency (%) |
| 7 | 17 | < 0.1% |
| 9 | 5 | < 0.1% |
| 10 | 8 | < 0.1% |
| 11 | 24 | < 0.1% |
| 12 | 8 | < 0.1% |
| 13 | 84 | < 0.1% |
| 14 | 156 | |
| 15 | 73 | < 0.1% |
| 16 | 133 | |
| 17 | 284 |
| Value | Count | Frequency (%) |
| 6929 | 1 | < 0.1% |
| 6726 | 1 | < 0.1% |
| 4950 | 1 | < 0.1% |
| 4764 | 1 | < 0.1% |
| 4681 | 1 | < 0.1% |
| 4513 | 1 | < 0.1% |
| 4194 | 19 | |
| 4175 | 1 | < 0.1% |
| 4034 | 1 | < 0.1% |
| 4016 | 1 | < 0.1% |
order_status_encoded
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 90131.9164 |
| Minimum | 2 |
|---|---|
| Maximum | 94019 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.1 MiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 94019 |
| Q1 | 94019 |
| median | 94019 |
| Q3 | 94019 |
| 95-th percentile | 94019 |
| Maximum | 94019 |
| Range | 94017 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 18648.32689 |
|---|---|
| Coefficient of variation (CV) | 0.2069003704 |
| Kurtosis | 19.06078671 |
| Mean | 90131.9164 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -4.589152056 |
| Sum | 2.519141997 × 10^10 |
| Variance | 347760095.8 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 94019 | 267857 | |
| 1002 | 5737 | 2.1% |
| 426 | 2371 | 0.8% |
| 298 | 1790 | 0.6% |
| 283 | 1698 | 0.6% |
| 6 | 33 | < 0.1% |
| 2 | 9 | < 0.1% |
| Value | Count | Frequency (%) |
| 2 | 9 | < 0.1% |
| 6 | 33 | < 0.1% |
| 283 | 1698 | 0.6% |
| 298 | 1790 | 0.6% |
| 426 | 2371 | 0.8% |
| 1002 | 5737 | 2.1% |
| 94019 | 267857 |
| Value | Count | Frequency (%) |
| 94019 | 267857 | |
| 1002 | 5737 | 2.1% |
| 426 | 2371 | 0.8% |
| 298 | 1790 | 0.6% |
| 283 | 1698 | 0.6% |
| 6 | 33 | < 0.1% |
| 2 | 9 | < 0.1% |
payment_type_encoded
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.1 MiB |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.95718707 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 72878 |
|---|---|
| 2nd row | 72878 |
| 3rd row | 2566 |
| 4th row | 72878 |
| 5th row | 72878 |
Common Values
| Value | Count | Frequency (%) |
| 72878 | 211698 | |
| 19111 | 55831 | 20.0% |
| 2566 | 7851 | 2.8% |
| 1481 | 4115 | 1.5% |
product_category_name_english_encoded
Real number (ℝ≥0)
| Distinct | 68 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5035.5705 |
| Minimum | 2 |
|---|---|
| Maximum | 9217 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.1 MiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 343 |
| Q1 | 3124 |
| median | 5540 |
| Q3 | 7622 |
| 95-th percentile | 9217 |
| Maximum | 9217 |
| Range | 9215 |
| Interquartile range (IQR) | 4498 |
Descriptive statistics
| Standard deviation | 2810.698341 |
|---|---|
| Coefficient of variation (CV) | 0.5581687994 |
| Kurtosis | -1.121375352 |
| Mean | 5035.5705 |
| Median Absolute Deviation (MAD) | 2082 |
| Skewness | -0.08453743093 |
| Sum | 1407416777 |
| Variance | 7900025.162 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 9217 | 29992 | 10.7% |
| 8705 | 23616 | 8.4% |
| 7622 | 20618 | 7.4% |
| 6610 | 20059 | 7.2% |
| 6295 | 19501 | 7.0% |
| 5540 | 16650 | 6.0% |
| 5769 | 16650 | 6.0% |
| 4132 | 13287 | 4.8% |
| 3849 | 11118 | 4.0% |
| 3788 | 10098 | 3.6% |
| Other values (58) | 97906 |
| Value | Count | Frequency (%) |
| 2 | 5 | < 0.1% |
| 6 | 11 | < 0.1% |
| 11 | 17 | < 0.1% |
| 12 | 17 | < 0.1% |
| 21 | 50 | < 0.1% |
| 23 | 75 | < 0.1% |
| 24 | 44 | < 0.1% |
| 27 | 154 | |
| 38 | 196 | |
| 39 | 293 |
| Value | Count | Frequency (%) |
| 9217 | 29992 | |
| 8705 | 23616 | |
| 7622 | 20618 | |
| 6610 | 20059 | |
| 6295 | 19501 | |
| 5769 | 16650 | |
| 5540 | 16650 | |
| 4132 | 13287 | |
| 3849 | 11118 | 4.0% |
| 3788 | 10098 | 3.6% |
timing_encoded
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.1 MiB |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.951838137 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 21463 |
|---|---|
| 2nd row | 32966 |
| 3rd row | 21463 |
| 4th row | 32966 |
| 5th row | 21463 |
Common Values
| Value | Count | Frequency (%) |
| 37046 | 107796 | |
| 32966 | 96853 | |
| 21463 | 61385 | |
| 4561 | 13461 | 4.8% |
Seasons_encoded
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.1 MiB |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.911154046 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 28526 |
|---|---|
| 2nd row | 20562 |
| 3rd row | 19472 |
| 4th row | 19472 |
| 5th row | 28526 |
Common Values
| Value | Count | Frequency (%) |
| 28526 | 89733 | |
| 18601 | 57780 | |
| 20562 | 54165 | |
| 19472 | 52985 | |
| 8875 | 24832 | 8.9% |
review_score
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.1 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 55899 | |
| 2 | 55899 | |
| 3 | 55899 | |
| 4 | 55899 | |
| 5 | 55899 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
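The definition above can be sketched in plain Python: rank both variables, then apply the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the toy data are illustrative assumptions, not part of the report or of pandas-profiling. Ties receive the average of their positions, which is a common convention.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this toy input ρ is 1 up to floating-point error because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.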
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
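A minimal sketch of this formula, together with a check of the invariance property, follows. The input lists are the product_weight_g and product_volume values of the ten rows shown under "First rows"; ten rows are far too few to reproduce the report's dataset-wide coefficient, so the result is only illustrative. The function name `pearson_r` and the particular rescalings are assumptions made for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n-1) factors in the covariance and both deviations cancel out.
    return cov / sqrt(ss_x * ss_y)

# product_weight_g and product_volume from the ten rows under "First rows".
weight_g = [1475, 300, 400, 2950, 1500, 200, 23900, 163, 1050, 2750]
volume = [12540, 352, 5967, 31939, 20000, 4410, 182952, 3528, 5610, 13475]

r = pearson_r(weight_g, volume)

# Shift and rescale each variable separately (positive scale factors).
weight_shifted = [w / 1000 + 5 for w in weight_g]
volume_scaled = [3 * v - 100 for v in volume]
r_transformed = pearson_r(weight_shifted, volume_scaled)

print(r, r_transformed)
```

Both calls return the same value up to floating-point error, which is the invariance described above. Note that a negative scale factor on one variable would flip the sign of r.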
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
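The pair-counting definition can be written out directly. The sketch below implements exactly the formula stated above, which is usually called τ-a. Statistical libraries often default to the tie-adjusted τ-b variant, so values may differ from theirs when the data contain ties. The function name `kendall_tau_a` and the toy data are illustrative assumptions.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move the same way
            concordant += 1
        elif direction < 0:    # the variables move in opposite directions
            discordant += 1
        # direction == 0 means a tie in x or y; it counts for neither side
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
tau = kendall_tau_a(x, y)
print(tau)  # → 0.4
```

This quadratic-time loop is fine for illustration, but a dataset of this report's size would need an O(n log n) implementation.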
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.
First rows
| | order_item_id | product_weight_g | payment_installments | order_delivered_customer_time_in_days | product_volume | customer_seller_distance | order_purchase_year | total_payment | order_status_encoded | payment_type_encoded | product_category_name_english_encoded | timing_encoded | Seasons_encoded | review_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1475 | 8 | 16 | 12540 | 862 | 2018 | 92 | 94019 | 72878 | 9217 | 21463 | 28526 | 1 |
| 1 | 1 | 300 | 1 | 2 | 352 | 52 | 2018 | 157 | 94019 | 72878 | 5540 | 32966 | 20562 | 1 |
| 2 | 1 | 400 | 1 | 13 | 5967 | 1513 | 2017 | 27 | 94019 | 2566 | 757 | 21463 | 19472 | 1 |
| 3 | 1 | 2950 | 1 | 10 | 31939 | 55 | 2018 | 65 | 94019 | 72878 | 3849 | 32966 | 19472 | 1 |
| 4 | 1 | 1500 | 10 | 18 | 20000 | 17 | 2017 | 101 | 94019 | 72878 | 3788 | 21463 | 28526 | 1 |
| 5 | 1 | 200 | 1 | 11 | 4410 | 527 | 2018 | 139 | 94019 | 72878 | 6610 | 32966 | 19472 | 1 |
| 6 | 1 | 23900 | 3 | 14 | 182952 | 324 | 2017 | 260 | 94019 | 72878 | 6295 | 37046 | 18601 | 1 |
| 7 | 1 | 163 | 1 | 7 | 3528 | 38 | 2017 | 97 | 94019 | 19111 | 7622 | 21463 | 8875 | 1 |
| 8 | 1 | 1050 | 8 | 16 | 5610 | 603 | 2017 | 115 | 94019 | 72878 | 8705 | 37046 | 20562 | 1 |
| 9 | 1 | 2750 | 1 | 46 | 13475 | 325 | 2018 | 301 | 94019 | 19111 | 6295 | 37046 | 18601 | 1 |
Last rows
| | order_item_id | product_weight_g | payment_installments | order_delivered_customer_time_in_days | product_volume | customer_seller_distance | order_purchase_year | total_payment | order_status_encoded | payment_type_encoded | product_category_name_english_encoded | timing_encoded | Seasons_encoded | review_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 279485 | 1 | 700 | 1 | 8 | 13167 | 27 | 2018 | 84 | 94019 | 1481 | 8705 | 4561 | 20562 | 5 |
| 279486 | 1 | 2700 | 6 | 24 | 6912 | 1760 | 2017 | 185 | 94019 | 72878 | 8705 | 37046 | 20562 | 5 |
| 279487 | 1 | 600 | 2 | 6 | 3220 | 313 | 2018 | 120 | 94019 | 72878 | 161 | 32966 | 20562 | 5 |
| 279488 | 1 | 4403 | 10 | 6 | 56160 | 238 | 2018 | 199 | 94019 | 72878 | 1019 | 32966 | 20562 | 5 |
| 279489 | 1 | 850 | 10 | 14 | 15625 | 1006 | 2018 | 151 | 94019 | 72878 | 3849 | 21463 | 20562 | 5 |
| 279490 | 1 | 500 | 8 | 6 | 12000 | 900 | 2018 | 984 | 94019 | 72878 | 5540 | 37046 | 20562 | 5 |
| 279491 | 1 | 2150 | 6 | 6 | 29792 | 717 | 2018 | 359 | 94019 | 72878 | 3849 | 21463 | 19472 | 5 |
| 279492 | 1 | 550 | 10 | 6 | 4320 | 934 | 2018 | 105 | 94019 | 72878 | 3849 | 32966 | 20562 | 5 |
| 279493 | 1 | 5250 | 1 | 16 | 31280 | 533 | 2018 | 219 | 94019 | 72878 | 1019 | 21463 | 18601 | 5 |
| 279494 | 1 | 900 | 7 | 18 | 2418 | 553 | 2017 | 35 | 94019 | 72878 | 3459 | 32966 | 18601 | 5 |
Most frequently occurring
| | order_item_id | product_weight_g | payment_installments | order_delivered_customer_time_in_days | product_volume | customer_seller_distance | order_purchase_year | total_payment | order_status_encoded | payment_type_encoded | product_category_name_english_encoded | timing_encoded | Seasons_encoded | review_score | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23957 | 1 | 1500 | 1 | 5 | 19760 | 70 | 2018 | 56 | 94019 | 72878 | 757 | 32966 | 18601 | 2 | 38 |
| 10121 | 1 | 325 | 1 | 19 | 4200 | 403 | 2018 | 48 | 94019 | 19111 | 4132 | 21463 | 18601 | 2 | 34 |
| 8492 | 1 | 280 | 1 | 26 | 5746 | 794 | 2018 | 45 | 94019 | 19111 | 7622 | 32966 | 28526 | 2 | 33 |
| 20647 | 1 | 1000 | 2 | 17 | 5250 | 1419 | 2017 | 29 | 94019 | 72878 | 5769 | 37046 | 28526 | 2 | 33 |
| 31629 | 1 | 6600 | 3 | 8 | 39375 | 455 | 2017 | 66 | 94019 | 72878 | 2815 | 37046 | 8875 | 2 | 33 |
| 335 | 1 | 67 | 2 | 11 | 3570 | 332 | 2018 | 25 | 94019 | 72878 | 8705 | 32966 | 18601 | 2 | 32 |
| 611 | 1 | 100 | 1 | 6 | 2208 | 86 | 2018 | 32 | 94019 | 72878 | 2815 | 32966 | 28526 | 2 | 32 |
| 5733 | 1 | 200 | 2 | 18 | 2816 | 263 | 2018 | 75 | 94019 | 72878 | 7622 | 4561 | 28526 | 2 | 32 |
| 6957 | 1 | 250 | 1 | 5 | 2700 | 25 | 2017 | 22 | 94019 | 19111 | 343 | 21463 | 28526 | 2 | 32 |
| 9795 | 1 | 300 | 5 | 22 | 1936 | 2035 | 2018 | 175 | 94019 | 72878 | 5540 | 4561 | 28526 | 2 | 32 |